This repository has been archived by the owner on Apr 5, 2022. It is now read-only.

XD-3751 Fix gpfdist processor shutdown #1901

Open
wants to merge 1 commit into
base: master

@jvalkeal jvalkeal commented Mar 3, 2016

  • Add a workaround for a problem in Reactor 2.0.x where
    onNext and onComplete deadlock if the ring buffer
    is full.
  • We now try to let the gpdb load session drain the stream, and detect
    whether that succeeded by checking the buffer size and its remaining
    capacity. If we can't drain, the last resort is to
    force a processor shutdown.

caxqueiroz pushed a commit to caxqueiroz/jdbcgpfdist that referenced this pull request Mar 28, 2016
@markpollack markpollack self-assigned this Mar 29, 2016
if (greenplumLoad != null) {

// xd waits 30s to shutdown module, so lets wait 25 to drain
Contributor

Can you explain this a bit more? It sounds like some sort of race condition. Shouldn't we let the entire buffer drain, since the messages in the buffer have already been ack'd (say, in Rabbit)?

Contributor Author

This deadlock within Reactor exists in 2.0.x, although it is fixed in 2.5. Effectively, when we try to shut down a processor, a signal is sent downstream indicating that it's complete, but if there are still messages in the ring buffer, that terminate signal never reaches the correct component in Reactor because we have already stopped draining. The module shutdown timeout is, afaik, hardcoded to 30 secs in XD, and after that things go a bit haywire if the module is not actually properly closed.

This was a workaround I came up with in discussion with Stephane. It relies on the fact that we keep the load operations running for a little less time than XD waits before it throws errors about being unable to shut down a module. We're hoping that these load operations will eventually drain the buffers and allow the terminate signal to go downstream, thus allowing the processor to shut down and, in turn, the module to shut down cleanly.
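
To make the idea concrete, here is a minimal, standalone sketch of the "wait for drain, then force" pattern. It is not the code from this PR and does not use Reactor: the class `DrainThenForceSketch`, its methods, and the `ArrayBlockingQueue` standing in for the ring buffer are all hypothetical names chosen for illustration. The real change waits about 25 seconds; the sketch takes the timeout as a parameter.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the gpfdist module's actual classes.
final class DrainThenForceSketch {

    enum Outcome { DRAINED, FORCED }

    // Polls isDrained until it reports true or the timeout elapses.
    // Returns false on timeout or if the waiting thread is interrupted.
    static boolean awaitDrained(BooleanSupplier isDrained, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!isDrained.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    // Gives the consumer a bounded amount of time to drain; forces shutdown otherwise.
    static Outcome shutdown(BooleanSupplier isDrained, Runnable forceShutdown,
                            long timeoutMillis, long pollMillis) {
        if (awaitDrained(isDrained, timeoutMillis, pollMillis)) {
            return Outcome.DRAINED;
        }
        forceShutdown.run();
        return Outcome.FORCED;
    }

    public static void main(String[] args) {
        final int capacity = 4;
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(capacity);
        buffer.add("row-1");
        buffer.add("row-2");

        // Stand-in for the load session that keeps consuming buffered items.
        Thread loader = new Thread(() -> {
            try {
                while (true) {
                    buffer.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        loader.setDaemon(true);
        loader.start();

        // "Drained" here means the buffer's remaining capacity equals its full capacity.
        Outcome outcome = shutdown(
                () -> buffer.remainingCapacity() == capacity,
                () -> System.out.println("forcing shutdown"),
                1000, 10);
        System.out.println(outcome);
    }
}
```

The point of keeping the wait shorter than the container's own timeout is that the forced path still runs before the container gives up on the module.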
